Abstract. In this paper, we relate the problem of finding a maximum clique to the intersection number of the input graph (i.e. the minimum number of cliques needed to edge cover the graph). In particular, we consider the maximum clique problem for graphs with small intersection number and random intersection graphs (a model in which each one of m labels is chosen independently with probability p by each one of n vertices, and there are edges between any vertices with overlaps in the labels chosen).

We first present a simple algorithm which, on input G, finds a maximum clique in $O(2^{2^m + O(m)} + n^2 \min\{2^m, n\})$ time steps, where m is an upper bound on the intersection number and n is the number of vertices. Consequently, when $m < \ln\ln n$ the running time of this algorithm is polynomial.

We then consider random instances of the random intersection graphs model as input graphs. As our main contribution, we prove that, when the number of labels is not too large ($m = n^\alpha$, $0 < \alpha < 1$), we can use the label choices of the vertices to find a maximum clique in polynomial time whp. The proof of correctness for this algorithm relies on our Single Label Clique Theorem, which roughly states that whp a "large enough" clique cannot be formed by more than one label. This theorem generalizes and strengthens other related results in the state of the art, but also broadens the range of values considered (see e.g. [20] and [3]).

As an important consequence of our Single Label Clique Theorem, we prove that the problem of inferring the complete information of label choices for each vertex from the resulting random intersection graph (i.e. the label representation of the graph) is solvable whp; namely, the maximum likelihood estimation method will provide a unique solution (up to permutations of the labels). Finding efficient algorithms for constructing such a label representation is left as an interesting open problem for future research.

1 Introduction

A clique in an undirected graph G is a subset of vertices any two of which are connected
by an edge. The cardinality of the maximum clique is called the clique number of G. The
problem of finding the maximum clique in an arbitrary graph is fundamental in Theoretical
Computer Science and appears in many different settings. As an example, consider a
social network where vertices represent people and edges represent mutual acquaintance. 
Finding a maximum clique in this network corresponds to finding the largest subset of 
people who all know each other. More generally, the analysis of large networks in order 
to identify communities, clusters, and other latent structure has come to the forefront of 
much research. The Internet, social networks, bibliographic databases, energy distribution 
networks, and global networks of economies are some of the examples motivating the 
development of the field. 

It is well known that determining the clique number of an arbitrary graph is NP-complete [15]. In fact, the fastest algorithm known today runs in time $O(1.1888^n)$ [18], where n is the number of vertices in the graph. Moreover, the best known approximation algorithm for the clique number has a performance guarantee of $O\left(\frac{n (\log\log n)^2}{(\log n)^3}\right)$ [?] (there are algorithms with better approximation ratios for graphs with large clique number; see e.g. [1]). Even though this approximation ratio appears to be weak at first glance, there are several results on hardness of approximation which suggest that there can be no approximation algorithm with an approximation ratio significantly less than linear (see e.g. [10]). It was also shown in [3] that, if k is the clique number, then the clique problem cannot be solved in time $n^{o(k)}$, unless the exponential time hypothesis fails (note that the brute force search algorithm runs in time $O(n^k k^2)$, which seems quite close).

The intractability of the maximum clique problem for arbitrary graphs led researchers to the study of the problem for appropriately generated random graphs. In particular, for Erdős-Rényi random graphs $G_{n,1/2}$ (i.e. random graphs in which each edge appears independently with probability 1/2), there are several greedy algorithms that find a clique of size about ln n with high probability (whp, i.e. with probability that tends to 1 as n goes to infinity), see e.g. [?]. Since the clique number of $G_{n,1/2}$ is asymptotically equal to 2 ln n with high probability, these algorithms approximate the clique number by a factor of 2. In fact, it was conjectured that finding a clique of size $(1 + \epsilon) \ln n$ (for a constant $\epsilon > 0$), with probability at least 1/2, would require techniques beyond the current limits of complexity theory. This belief was strengthened by the fact that the Metropolis algorithm also fails to find the maximum clique in $G_{n,1/2}$ (see [?]). A more dramatized version of the above conjecture was presented in [11], stating that the problem of finding a 1.01 ln n clique remains hard even if the input graph is a $G_{n,1/2}$ random graph in which we have planted a randomly chosen clique of size $n^{0.49}$. This conjecture has some interesting cryptographic consequences, as shown in [12]. It also seems tight, since finding the maximum clique in the case where the planted clique has size at least $\sqrt{n}$ can be done in polynomial time by using spectral properties of the adjacency matrix of the graph (see [2]). We finally note that there are quite a few nice results concerning generalizations of the planted clique problem in various (quite general) random graph models (see e.g. [?]).

1.1 Our Contribution 

In this work, we complement the state of the art by relating the maximum clique problem 
to the intersection number of the input graph G (i.e. the minimum number of cliques that 
can edge cover G). In particular, we consider the maximum clique problem for graphs with 
small intersection number and random intersection graphs. 

More analytically, we begin by considering arbitrary graphs with small intersection number. We present a simple algorithm which, on input G, finds a maximum clique in $O(2^{2^m + O(m)} + n^2 \min\{2^m, n\})$ time steps, where m is an upper bound on the intersection number of G and n is the number of vertices. Consequently, when $m < \ln\ln n$ the running time of this algorithm is polynomial. We note here that computing the exact value of the intersection number of G is itself an NP-complete problem, but this knowledge is only needed in the analysis of the algorithm.

We then consider random instances of the random intersection graphs model (introduced in [13,20]) as input graphs. In this model, denoted by $\mathcal{G}_{n,m,p}$, each one of m labels is chosen independently with probability p by each one of n vertices, and there are edges between any vertices with overlaps in the labels chosen. Random intersection graphs are relevant to social networking and capture it quite nicely. Indeed, a social network is a structure made of nodes (individuals or organizations) tied by one or more specific types of interdependency, such as values, visions, financial exchange, friends, conflicts, web links etc. Social network analysis views social relationships in terms of nodes and ties. Nodes are the individual actors within the networks and ties are the relationships between the actors. Other applications include oblivious resource sharing in a (general) distributed setting, efficient and secure communication in sensor networks [16], interactions of mobile agents traversing the web etc. Even epidemiological phenomena (like spread of disease) tend to be more accurately captured by this "interaction-sensitive" random graph model.

As our main contribution, we prove that, when the number of labels is not too large, we can use the label choices of the vertices to find a maximum clique in polynomial time (in the number of labels m and vertices n of the graph). Most of the work in this paper is devoted to proving our Single Label Clique Theorem (Theorem 3 in Section 4). Our proof technique is original and employs a probabilistic contradiction argument. The theorem states that when the number of labels is less than the number of vertices, any large enough clique in a random instance of $\mathcal{G}_{n,m,p}$ is formed by a single label. This statement may seem obvious when p is small, but it is hard to imagine that it still holds for all "interesting" values for p (see also the discussion in Section 5). Indeed, when $p = o\left(\sqrt{\frac{1}{nm}}\right)$, by slightly modifying an argument of [3], we can see that $G_{n,m,p}$ almost surely has no cycle of size $k \ge 3$ whose edges are formed by k distinct labels (alternatively, the intersection graph produced by reversing the roles of labels and vertices is a tree). On the other hand, for larger p a random instance of $\mathcal{G}_{n,m,p}$ is far from perfect¹ and the techniques of [3] do not apply (for a more thorough discussion see the beginning of Section 4). By using the Single Label Clique Theorem, we provide a tight bound on the clique number of $G_{n,m,p}$ when $m = n^\alpha$, $\alpha < 1$. A lower bound in the special case where $mp^2$ is constant was given in [20]. We considerably broaden this range of values to also include vanishing values for $mp^2$ and also provide an asymptotically tight upper bound.

We claim that our proof also applies for $\alpha < 2$, provided p is not too small. We should note here that in [8] the authors prove the equivalence (measured in terms of total variation distance) of random intersection graphs and Erdős-Rényi random graphs, when $m = n^\alpha$, $\alpha > 6$. This bound on the number of labels was improved in [19], by showing equivalence of sharp threshold functions among the two models for $\alpha > 3$. In view of these results, we expect that our work will also shed light on the problem of finding maximum cliques in Erdős-Rényi random graphs.

Finally, as yet another consequence of our Single Label Clique Theorem, we prove that the problem of inferring the complete information of label choices for each vertex from the resulting random intersection graph (i.e. the label representation of the graph) is solvable whp; namely, the maximum likelihood estimation method will provide a unique solution (up to permutations of the labels)*. In particular, given values m, n and p, such that $m = n^\alpha$, $0 < \alpha < 1$, and given a random instance of the $\mathcal{G}_{n,m,p}$ model, the label choices for each vertex are uniquely defined. Finding efficient algorithms for constructing such a label representation is left as an open problem for future research.

¹ A perfect graph is a graph in which the chromatic number of every induced subgraph equals the size of the largest clique of that subgraph. Consequently, the clique number of a perfect graph is equal to its chromatic number.

* More precisely, if $\mathcal{B}$ is the set of different label choices that can give rise to a graph G, then the problem of inferring the complete information of label choices from G is solvable if there is some $B^* \in \mathcal{B}$ such that $\Pr(B^* | G) > \Pr(B | G)$, for all $B \in \mathcal{B}$, $B \ne B^*$.

1.2 Organization of the paper

In Section 2 we formally define random intersection graphs. We also provide some useful definitions and notation which are used throughout the paper. The relation of the intersection number to the clique number of an arbitrary graph is discussed in Section 3. Section 4 is devoted to the proof of our Single Label Clique Theorem for random intersection graphs. The consequences of our main theorem concerning the efficient construction of a maximum clique and the uniqueness of the label representation of $G_{n,m,p}$ are presented in
Section [3 Finally, we discuss the presented results and further research in Section [H 

2 Definitions and Preliminaries 

The formal definition of the random intersection graphs model is as follows: 

Definition 1 (Random Intersection Graph - $\mathcal{G}_{n,m,p}$ [13,20]). Consider a universe $\mathcal{M} = \{1, 2, \ldots, m\}$ of elements and a set of n vertices V. Assign independently to each vertex $v \in V$ a subset $S_v$ of $\mathcal{M}$, choosing each element $i \in \mathcal{M}$ independently with probability p, and draw an edge between two vertices $v \ne u$ if and only if $S_v \cap S_u \ne \emptyset$. The resulting graph is an instance $G_{n,m,p}$ of the random intersection graphs model.
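
Definition 1 translates directly into a sampling procedure. The following is a minimal Python sketch, not code from the paper: the function name `sample_random_intersection_graph`, the 0-based numbering of vertices and labels, and the `seed` parameter are assumptions made for illustration.

```python
import random
from itertools import combinations


def sample_random_intersection_graph(n, m, p, seed=None):
    """Sample the label sets S_v and the edge set of one instance G_{n,m,p}."""
    rng = random.Random(seed)
    # Each vertex chooses each of the m labels independently with probability p.
    label_sets = [
        {i for i in range(m) if rng.random() < p}
        for _ in range(n)
    ]
    # Two distinct vertices are adjacent iff their label sets intersect.
    edges = {
        (u, v)
        for u, v in combinations(range(n), 2)
        if label_sets[u] & label_sets[v]
    }
    return label_sets, edges
```

The quadratic loop over vertex pairs is chosen for clarity; for large n one would rather build the edges label by label from the sets $L_i$.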

In this model we also denote by $L_i$ the set of vertices that have chosen label $i \in \mathcal{M}$. Given $G_{n,m,p}$, we will refer to $\{L_i, i \in \mathcal{M}\}$ as its label representation. Consider the bipartite graph with vertex set $V \cup \mathcal{M}$ and edge set $\{(v,i) : i \in S_v\} = \{(v,i) : v \in L_i\}$. We will refer to this graph as the bipartite random graph $B_{n,m,p}$ associated to $G_{n,m,p}$. Notice that the associated bipartite graph is uniquely defined by the label representation.

It follows from the definition of the model that the edges in $G_{n,m,p}$ are not independent. In particular, the (unconditioned) probability that a specific edge exists is $1 - (1 - p^2)^m$. Therefore, if $mp^2$ goes to infinity with n, then this probability goes to 1. In the paper, we will thus consider the "interesting" range of values $mp^2 = O(1)$ (i.e. the range of values for which the unconditioned probability that an edge exists does not go to 1). Furthermore, as is usual in the literature, we will assume that the number of labels is some power of the number of vertices, i.e. $m = n^\alpha$, for some $\alpha > 0$.
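
To see why $mp^2 = O(1)$ is the interesting range, the edge probability can be evaluated numerically. This is a small illustrative sketch; the function name `edge_probability` is hypothetical and the parameter values in the usage note are arbitrary.

```python
import math


def edge_probability(m, p):
    """Unconditioned probability that a fixed pair of vertices shares a label."""
    # A given label is chosen by both endpoints with probability p^2,
    # independently over the m labels.
    return 1.0 - (1.0 - p * p) ** m
```

For example, with m = 10^6 and p = 10^-3 (so $mp^2 = 1$) the value is close to $1 - e^{-1} \approx 0.632$, whereas with p = 10^-2 (so $mp^2 = 100$) it is numerically indistinguishable from 1.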
The following definitions will also be useful: 

Definition 2 (Intersection number). The intersection number of a graph G is the 
smallest number of cliques needed to cover all of the edges of G. 

Equivalently, the intersection number is the smallest number of elements in a representation 
of G as an intersection graph of finite sets. 

Definition 3 (Edge clique cover). A set of cliques $\mathcal{C} = \{C_1, \ldots, C_m\}$ is an edge clique cover of a graph G = (V, E) if for every edge $e \in E$ there is at least one clique $C_i$ such that $e \in C_i$ and for every non edge $e' \notin E$, there is no such clique in $\mathcal{C}$.

Therefore, the intersection number of G is the minimum m such that $\mathcal{C} = \{C_1, \ldots, C_m\}$ is an edge clique cover of G.
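
Definition 3 can be checked mechanically for a candidate family of vertex sets. The sketch below is an illustration only; the function name `is_edge_clique_cover` and the input format (an edge list plus a list of vertex sets) are assumptions, not notation from the paper.

```python
from itertools import combinations


def is_edge_clique_cover(edges, cover):
    """Return True iff `cover` is an edge clique cover of the graph with these edges."""
    edge_set = {frozenset(e) for e in edges}
    covered = set()
    for clique in cover:
        for u, v in combinations(list(clique), 2):
            pair = frozenset((u, v))
            # A pair inside a cover set that is a non-edge violates the definition.
            if pair not in edge_set:
                return False
            covered.add(pair)
    # Every edge of the graph must lie inside at least one cover set.
    return covered == edge_set
```

For the path 1-2-3, the family [{1, 2}, {2, 3}] is accepted, while [{1, 2, 3}] is rejected because it would cover the non-edge {1, 3}.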

2.1 Notation 

We use the convention that the random intersection graphs model is denoted by $\mathcal{G}_{n,m,p}$ (i.e. with a calligraphic G), while a specific random instance of the model is denoted by $G_{n,m,p}$ (i.e. with a simple G).

For a vertex $v \in V$, we denote by $N_G(v)$ the set of neighbors of v in G. We will say that two vertices $v, u \in V$ belong to the same closed neighborhood in G, and we will write $v \leftrightarrow_G u$, if and only if $N_G(v) \cup \{v\} = N_G(u) \cup \{u\}$.

Let $\mathcal{C}$ denote a partition of the vertex set of a graph G and let $v \in V$. We will denote by $\mathcal{C}[v]$ the unique set inside $\mathcal{C}$ that contains v, that is, $\mathcal{C}[v]$ is the set $C' \in \mathcal{C}$ with $v \in C'$.

Throughout the paper, we make use of the well known asymptotic notation $O(\cdot)$, $o(\cdot)$ and $\omega(\cdot)$. Furthermore, we use the relation "~" for asymptotically equal. In particular, if f(n), g(n) are two functions of n, then f(n) ~ g(n) means that $\lim_{n \to \infty} f(n)/g(n) = 1$ or equivalently f(n) = g(n) + o(g(n)).
3 An Algorithm for Maximum Clique

In this section we consider arbitrary graphs as input graphs for the maximum clique prob- 
lem. In particular, we relate the running time of the following algorithm to the intersection 
number of the input graph G. 

Algorithm FIND_MAX-CLIQUE
Input: G = (V, E)

1. Set U = V and $\mathcal{C} = \emptyset$;

% Form the closed neighborhood partition %

2. while $U \ne \emptyset$ do

3. Pick $v \in U$ and let $C' = \{u \in U : u \leftrightarrow_G v\}$;

4. Include C' in $\mathcal{C}$;

5. Set $U = U \setminus C'$; endwhile

% Define an induced subgraph %

6. Let G' = (V', E') be an induced subgraph of G that has exactly one vertex for every set $C' \in \mathcal{C}$;

% Find a clique of G' that corresponds to the maximum clique of G %

7. Using exhaustive search, find a clique S in G' such that $|\bigcup_{v' \in S} \mathcal{C}[v']|$ is maximum;

8. Output $Q = \bigcup_{v' \in S} \mathcal{C}[v']$;
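
The steps of FIND_MAX-CLIQUE can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `find_max_clique`, the vertex-list and edge-list input format, and the grouping of vertices by hashing their closed neighborhoods (instead of the pairwise comparison loop of steps 2 to 5) are assumptions made for the example.

```python
from itertools import combinations


def find_max_clique(vertices, edges):
    """Return a maximum clique; exponential in the number of closed neighborhoods."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Steps 1-5: partition V into classes with equal closed neighborhood N_G(v) + {v}.
    classes = {}
    for v in vertices:
        classes.setdefault(frozenset(adj[v] | {v}), []).append(v)
    groups = list(classes.values())

    # Step 6: G' has one representative vertex per closed neighborhood.
    reps = [group[0] for group in groups]

    # Step 7: exhaustive search over all non-empty subsets of representatives.
    best = []
    k = len(reps)
    for mask in range(1, 2 ** k):
        chosen = [i for i in range(k) if (mask >> i) & 1]
        if all(reps[j] in adj[reps[i]] for i, j in combinations(chosen, 2)):
            clique = [v for i in chosen for v in groups[i]]
            if len(clique) > len(best):
                best = clique

    # Step 8: the union of the selected closed neighborhoods.
    return set(best)
```

The exhaustive search enumerates $2^k$ subsets, where k is the number of closed neighborhoods, so the sketch is only practical when k is small, which is exactly the regime the analysis below addresses.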



An example of how the graph G' is constructed (in step 6) for a specific graph G is shown in Figure 1. Notice that G has five closed neighborhoods (whereas its intersection number is 3), which are shown in dashed squares, so the graph G' has 5 vertices. The corresponding clique of G' that maximizes $|\bigcup_{v' \in S} \{u : u \leftrightarrow_G v'\}|$ is S = {4, 6}.




Fig. 1. An example of a graph G and corresponding G' . 



3.1 Analysis of FIND_MAX-CLIQUE 

We first present the following lemma that concerns basic properties of the relation $\leftrightarrow_G$.

Lemma 1. The closed neighborhood relation $\leftrightarrow_G$ has the following properties:

1. It is an equivalence relation which partitions the vertex set V in equivalence classes called closed neighborhoods.
2. A closed neighborhood is a clique. Two closed neighborhoods either form a clique, or no edge between their vertices exists.

Proof. (1) The fact that $\leftrightarrow_G$ is an equivalence relation follows directly by its definition. Therefore, every vertex belongs to exactly one equivalence class (i.e. exactly one closed neighborhood).

(2) By definition, a closed neighborhood forms a clique. Let now $C'_1, C'_2$ be two distinct closed neighborhoods and let $v \in C'_1$, $u \in C'_2$. Suppose that there is an edge (u, v) between u and v in G, i.e. $u \in N_G(v)$. Consider now any two vertices $v' \in C'_1$, $u' \in C'_2$ (including v, u). By definition of the closed neighborhood relation, we must have that $N_G(v) \cup \{v\} = N_G(v') \cup \{v'\}$ and $N_G(u) \cup \{u\} = N_G(u') \cup \{u'\}$. Since the closed neighborhoods are disjoint, the first equality gives $u \in N_G(v')$, and the second then gives $u' \in N_G(v')$. Therefore, either every edge between $C'_1$ and $C'_2$ appears in G, and $C'_1 \cup C'_2$ forms a clique, or no edge between them exists. This completes the proof.

□ 

We now prove the following theorem about the correctness of the Algorithm FIND_MAX-CLIQUE.

Theorem 1 (Correctness). FIND_MAX-CLIQUE correctly outputs a maximum clique in G.

Proof. Notice that, by the second part of Lemma 1 and by construction of $\mathcal{C}$, any clique S in G' corresponds to a clique $\bigcup_{v' \in S} \mathcal{C}[v']$ in G.

Therefore, we only need to show that a maximum clique Q of G corresponds to a clique in G', because then the algorithm will be able to find it in step 7. Equivalently, we need to show that there are $k \ge 1$ closed neighborhoods $C'_1, \ldots, C'_k$ which constitute a partition of Q, that is $\bigcup_{j=1}^{k} C'_j = Q$. Indeed, by construction of G', the vertices in G' that correspond to these closed neighborhoods will form a clique in G' (any choice of two vertices will be connected).

To prove the above, let C' be a closed neighborhood that has at least one common vertex v' with Q, i.e. $v' \in C' \cap Q$. Then, by definition of the relation, every vertex $u' \leftrightarrow_G v'$ is connected to v' and to all the vertices that v' is connected to (including all vertices in Q). Therefore, by maximality of Q, all the vertices in C' must be contained in the maximum clique, i.e. $C' \subseteq Q$. Consequently, a closed neighborhood is either entirely contained in Q, or disjoint from it. By the first part of Lemma 1 we can then partition Q using all the closed neighborhoods that have common vertices with Q. This completes the proof.

□ 

The following result relates the running time of Algorithm FIND_MAX-CLIQUE to the intersection number of its input graph G.

Theorem 2 (Efficiency). Let G = (V, E) be a graph with intersection number m. Then FIND_MAX-CLIQUE on input G finds a maximum clique in $O(2^{2^m + O(m)} + n^2 \min\{2^m, n\})$ time steps.

Proof. By definition, since the intersection number of G is m, there is a set of cliques $\mathcal{C} = \{C_1, \ldots, C_m\}$ that is an edge clique cover of G. For a vertex $v \in V$ we denote by $S_v$ the set of cliques in $\mathcal{C}$ that include v. Notice then that if $S_v = S_u \ne \emptyset$, then not only are u and v connected, but they also have the exact same set of neighbors in $V \setminus \{u, v\}$, i.e. $N_G(v) \cup \{v\} = N_G(u) \cup \{u\}$.

Given now a specific edge clique cover $\mathcal{C}$, there are at most $2^m$ different ways in which we can construct a set $S_v$. Consequently, there are at most $\min\{2^m, n\}$ distinct closed neighborhoods in G which constitute a partition of the set of non-isolated vertices. Note also that determining whether or not $v \leftrightarrow_G u$ for any two vertices requires O(n) steps. Therefore, steps 2 to 5 needed for partitioning the vertex set V in closed neighborhoods in the algorithm require $O(n^2 \min\{2^m, n\})$ time.

From the above, we also conclude that the number of vertices in G' is at most $2^m$. Therefore, the time needed to construct G' in step 6 in the algorithm is $O(2^{2m})$. Finally, there are at most $2^{2^m}$ subsets of vertices in G', so step 7 in the algorithm takes $O(2^{2^m + O(m)})$ time. This completes the proof.

□ 

Note that the algorithm does not need the actual value of the intersection number. We only use this information for bounding its running time. The following is a direct consequence of Theorem 2.

Corollary 1. Let $m < \ln\ln n$ be an upper bound on the intersection number of an arbitrary undirected graph G on n vertices. Then there is an algorithm that finds the maximum clique of G in time $O(n^2 \ln n)$.
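
The bound of Corollary 1 can be checked term by term from Theorem 2. The following short derivation is a sketch under the assumptions $m \le \ln\ln n$ and $n \ge 3$ (so that $\ln n \ge 1$); the constant hidden in $O(m)$ is whatever Theorem 2 provides.

```latex
\begin{align*}
2^{m} &\le 2^{\ln\ln n} = (\ln n)^{\ln 2} \le \ln n, \\
2^{2^{m} + O(m)} &\le 2^{(\ln n)^{\ln 2}} \cdot (\ln n)^{O(1)} = n^{o(1)}, \\
n^{2}\min\{2^{m}, n\} &\le n^{2}\ln n, \\
2^{2^{m} + O(m)} + n^{2}\min\{2^{m}, n\} &= O(n^{2}\ln n).
\end{align*}
```

The second line uses $(\ln n)^{\ln 2} = o(\ln n)$, so the exhaustive search term is dominated by the cost of forming the closed neighborhood partition.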

As a final remark, since the intersection number of $G_{n,m,p}$ is at most m (but could be even less), the above result also holds for any random instance of the random intersection graphs model with at most ln ln n labels.



4 Clique number for $m = n^\alpha$, $0 < \alpha < 1$



In this section we give a tight bound on the clique number of $G_{n,m,p}$ when $m = n^\alpha$, $\alpha < 1$. A lower bound in the special case where $mp^2$ is constant was given in [20]. We considerably broaden this range of values to also include vanishing values for $mp^2$ and also provide a tight upper bound.

We will also assume, without loss of generality, that $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$. Indeed, when $p = o\left(\sqrt{\frac{1}{nm}}\right)$, by slightly modifying an argument of [3], we can see that $G_{n,m,p}$ almost surely has no cycle of size $k \ge 3$ whose edges are formed by k distinct labels. Therefore, the maximum clique of $G_{n,m,p}$ when $p = o\left(\sqrt{\frac{1}{nm}}\right)$ is formed by exactly one label. As a matter of fact, if $L_i$ is the set of vertices that have chosen label $i \in \mathcal{M}$, then the maximum clique is equal to $L_l$, where $l \in \arg\max_{i \in \mathcal{M}} |L_i|$. Furthermore, since $G_{n,m,p}$ is chordal whp (see Lemma 5 in [3]), the maximum clique can be found in polynomial time.
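
When the maximum clique is formed by a single label, it can be read off the label representation by taking a largest set $L_i$. A minimal sketch, assuming the label sets $S_v$ are given as a list indexed by vertex and labels are numbered 0 to m-1 (the function name `largest_label_class` is hypothetical):

```python
def largest_label_class(label_sets, m):
    """Return (l, L_l) where L_l is a largest set of vertices sharing one label."""
    # L_i is the set of vertices that have chosen label i.
    classes = {
        i: {v for v, chosen in enumerate(label_sets) if i in chosen}
        for i in range(m)
    }
    best = max(classes, key=lambda i: len(classes[i]))
    return best, classes[best]
```

This takes O(nm) set-membership tests, which is polynomial in the number of labels and vertices. It returns a maximum clique only in the situation described above, where every maximum clique is formed by one label.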

We stress the fact that the techniques employed to provide the algorithmic and structural results in [3] cannot be used in the case where $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$. In particular, $G_{n,m,p}$ is far from perfect, especially in the case $mp = \omega(\ln n)$ (which is included in the range of values that we study here). An intuitive justification is as follows: when $mp = \omega(\ln n)$, the sizes of the label sets of the vertices are highly concentrated around their mean value mp. Therefore, the statistical behavior of $G_{n,m,p}$ is expected to be similar to the statistical behavior of uniform random intersection graphs $G_{n,m,\lambda}$, in which each vertex selects exactly $\lambda = mp$ labels from $\mathcal{M}$. It was proved in [17] (part (iii) in Corollary 2) that the size of the maximum independent set when $m = n^\alpha$, $\alpha < 1$ and $\lambda = \omega(\ln n)$ is asymptotically equal to $2(1-\alpha)\frac{m \ln n}{\lambda^2}$. Therefore, when $mp = \omega(\ln n)$, the size of the maximum independent set in $G_{n,m,p}$ will be around $2(1-\alpha)\frac{\ln n}{mp^2}$, so its chromatic number will be $\Omega\left(\frac{nmp^2}{\ln n}\right)$. However, as can be seen in Corollary 5 (which is a direct consequence of our main theorem), the size of the maximum clique in $G_{n,m,p}$ when $m = n^\alpha$, $\alpha < 1$ and $mp^2 = O(1)$ is asymptotically equal to np. This is much smaller than the lower bound $\Omega\left(\frac{nmp^2}{\ln n}\right)$ on the chromatic number in the case $mp = \omega(\ln n)$. Therefore, $G_{n,m,p}$ is far from perfect in this range of values.

We first provide some concentration results concerning the number of vertices that 
have chosen a particular label and the number of vertices that have chosen two particular 
labels. 

Lemma 2. Let $G_{n,m,p}$ be a random instance of the random intersection graphs model with $m = n^\alpha$, $0 < \alpha < 1$ and $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$. Then the following hold:

A. Let $L_i$ be the set of vertices that have chosen label $i \in \mathcal{M}$. Then

$$\Pr\left(\exists i \in \mathcal{M} : \big| |L_i| - np \big| \ge 3\sqrt{np \ln n}\right) \to 0. \qquad (1)$$

B. Let also $S_v$ denote the set of labels that were chosen by vertex v. Then

$$\Pr\left(\exists v \in V : |S_v| \ge mp + 3\sqrt{mp \ln m} + \ln n\right) \to 0. \qquad (2)$$

Proof. For the first part, fix a label $i \in \mathcal{M}$. Notice that $|L_i|$ is a binomial random variable with parameters n, p, i.e. $|L_i| \sim B(n,p)$. By Chernoff bounds, for any $t > 0$, we have that

$$\Pr\left(\big| |L_i| - np \big| \ge t\right) \le e^{-\frac{t^2}{2(np + t/3)}} + e^{-\frac{t^2}{2np}}.$$

Setting $t = 3\sqrt{np \ln n}$ and noting that $t = o(np)$, we then have that $\Pr\left(\big| |L_i| - np \big| \ge 3\sqrt{np \ln n}\right) \le 2e^{-(1-o(1))\frac{9}{2}\ln n}$, and the lemma follows from Boole's inequality.

For the second part, fix a vertex v. Notice that $|S_v|$ is a binomial random variable with parameters m, p, i.e. $|S_v| \sim B(m,p)$. By Chernoff bounds, for any $\delta > 0$, we have that

$$\Pr\left(|S_v| \ge (1+\delta)mp\right) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{mp}.$$

Setting $\delta = \frac{1}{mp}\left(3\sqrt{mp \ln m} + \ln n\right)$ and using Boole's inequality we get the desired result.

□ 
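
The concentration in part A of Lemma 2 can be observed empirically. The following is a small simulation sketch, not part of the paper's argument; the function names `label_class_sizes` and `max_deviation_ratio`, the seed, and the parameter values in the usage note are assumptions chosen for illustration.

```python
import math
import random


def label_class_sizes(n, m, p, seed=None):
    """Sample |L_i| for every label i in one instance of the model."""
    rng = random.Random(seed)
    sizes = [0] * m
    for _ in range(n):
        for i in range(m):
            # Each vertex picks label i independently with probability p.
            if rng.random() < p:
                sizes[i] += 1
    return sizes


def max_deviation_ratio(n, m, p, seed=None):
    """Largest | |L_i| - np | divided by the threshold 3 * sqrt(np ln n)."""
    threshold = 3.0 * math.sqrt(n * p * math.log(n))
    sizes = label_class_sizes(n, m, p, seed)
    return max(abs(s - n * p) for s in sizes) / threshold
```

For instance, with n = 2000, m = 45 (roughly $n^{1/2}$) and p = 0.05, the ratio returned by `max_deviation_ratio` should stay well below 1, in line with inequality (1).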

Notice that the above lemma provides a lower bound on the clique number. However, a clique in $G_{n,m,p}$ can be formed by combining more than one label. Clearly, a clique Q which is not formed by a single label will need at least 3 labels, since 2 labels cannot cover all the edges needed for Q to be a clique. In the discussion below, we will provide a much larger lower bound on the number of labels needed to form a clique Q of size |Q| ~ np which is not formed by a single label. The following definition will be useful.

Definition 4. Denote by $A_{y,x}$ the event that there are two disjoint sets of vertices $V_1, V_2 \subset V$, where $|V_1| = y$ and $|V_2| = x$, such that the following hold:

1. All vertices in $V_1$ have chosen some label $l_0$, i.e. $l_0 \in \bigcap_{u \in V_1} S_u$.

2. None of the vertices in $V_2$ has chosen $l_0$, i.e. $l_0 \notin \bigcup_{v \in V_2} S_v$.

3. Every vertex in $V_1$ is connected to every vertex in $V_2$.
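
The three conditions of Definition 4 can be checked directly for a given candidate triple. A minimal sketch, assuming the label sets are stored in a mapping from vertices to sets; the function name `is_witness` is hypothetical.

```python
def is_witness(label_sets, v1, v2, l0):
    """Check whether (V_1, V_2, l_0) satisfies the three conditions of Definition 4."""
    v1, v2 = set(v1), set(v2)
    if v1 & v2:
        return False  # the two vertex sets must be disjoint
    # Condition 1: every vertex of V_1 has chosen l_0.
    if not all(l0 in label_sets[u] for u in v1):
        return False
    # Condition 2: no vertex of V_2 has chosen l_0.
    if any(l0 in label_sets[v] for v in v2):
        return False
    # Condition 3: every vertex of V_1 shares some label with every vertex of V_2.
    return all(bool(label_sets[u] & label_sets[v]) for u in v1 for v in v2)
```

The event $A_{y,x}$ occurs when some triple with $|V_1| = y$ and $|V_2| = x$ passes this check; the lemmas below bound the probability of that event without enumerating triples.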

As a warm-up, we prove the following technical lemma, which is a first indication that 
in a Gn,m,p graph, whp we cannot have y too large and x too small at the same time. This 
lemma will also be used as a starting step in the proof of our main theorem. 
Lemma 3. Let $G_{n,m,p}$ be a random instance of the random intersection graphs model with $m = n^\alpha$, $0 < \alpha < 1$, $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$ and $mp^2 = O(1)$. Then, for any $y \ge np\left(1 - o\left(\frac{1}{\ln n}\right)\right)$, we have $\Pr(A_{y,1}) \to 0$.

Proof. Fix a particular label $l_0$, a subset $V_1$ of y vertices having chosen $l_0$ (i.e. $V_1 \subseteq L_{l_0}$) and a vertex $v \notin L_{l_0}$. The probability that v is connected to all vertices in $V_1$ is exactly

$$p(V_1, v) = \sum_{k=1}^{m-1} \binom{m-1}{k} p^k (1-p)^{m-1-k} \left(1 - (1-p)^k\right)^{y}. \qquad (3)$$

Indeed, $p^k (1-p)^{m-1-k}$ is the probability that v has chosen k specific labels different from $l_0$ and $1 - (1-p)^k$ is the probability that a specific vertex in $V_1$ has chosen at least one of those labels (so that it is connected to v).

By Boole's and Markov's inequality we then have that

$$\Pr(A_{y,1}) \le m \binom{|L_{l_0}|}{y} \left(n - |L_{l_0}|\right) p(V_1, v). \qquad (4)$$

By Lemma 2, for any vertex v we have that $|S_v| \le (1 + o(1))mp + \ln n$ whp. Since $\left(1 - (1-p)^k\right)^y$ is increasing in k and also $\binom{m-1}{k} p^k (1-p)^{m-1-k}$ is maximum around mp, we conclude that the maximum of $\binom{m-1}{k} p^k (1-p)^{m-1-k} \left(1 - (1-p)^k\right)^y$ for $k \in \{1, \ldots, (1 + o(1))mp\}$ is attained at some index $k' = (1 + o(1))mp$. Therefore,

$$\Pr(A_{y,1}) \le m^2 n \binom{|L_{l_0}|}{y} \binom{m-1}{k'} p^{k'} (1-p)^{m-1-k'} \left(1 - (1-p)^{k'}\right)^y + o(1) \qquad (5)$$

$$\le m^2 n \binom{|L_{l_0}|}{y} \left(1 - (1-p)^{k'}\right)^y + o(1) \qquad (6)$$

where the o(1) term corresponds to the error term from Lemma 2. Using now the fact that (by the expansion of the natural logarithm) $(1-p)^{1/p} = e^{\frac{1}{p}\ln(1-p)} = e^{-\sum_{j=1}^{\infty} \frac{p^{j-1}}{j}} \ge e^{-1-\sum_{j=2}^{\infty} p^{j-1}} = e^{-1-\frac{p}{1-p}} \ge e^{-1-2p}$, for any $p \to 0$, we have that

$$\Pr(A_{y,1}) \le m^2 n \binom{|L_{l_0}|}{y} \left(1 - e^{-2mp^2}\right)^y + o(1) \qquad (7)$$

$$= m^2 n \binom{|L_{l_0}|}{|L_{l_0}| - y} \left(1 - e^{-2mp^2}\right)^y + o(1) \qquad (8)$$

$$\le m^2 n\, |L_{l_0}|^{|L_{l_0}| - y} \left(1 - e^{-2mp^2}\right)^y + o(1). \qquad (9)$$

For any $y \ge |L_{l_0}| \left(1 - o\left(\frac{1}{\ln n}\right)\right)$, we then have that $\Pr(A_{y,1}) \to 0$. But by Lemma 2 we have also that $|L_{l_0}| \le np \left(1 + o\left(\frac{1}{\ln n}\right)\right)$, which completes the proof.

□ 

The above lemma has the following alternative interpretation, which will be useful in the sequel:

Corollary 2. Let $G_{n,m,p}$ be a random instance of the random intersection graphs model with $m = n^\alpha$, $0 < \alpha < 1$, $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$ and $mp^2 = O(1)$. Let also Q be a clique in $G_{n,m,p}$ that is not formed by a single label and also |Q| ~ np. If $l_0 \in \mathcal{M}$ is any label chosen by some vertex $v \in Q$, then there is a positive constant $c' < \frac{1-\alpha}{2}$ such that whp there are at least $n^{c'}$ vertices in Q that have not chosen $l_0$.

Proof. Notice that, by assumption, $np = \Omega\left(n^{\frac{1-\alpha}{2}}\right)$. Therefore, for any positive $c' < \frac{1-\alpha}{2}$ we have that $n^{c'} = o\left(\frac{np}{\ln n}\right)$. The result then follows by Lemma 3.

□ 

We now strengthen the above analysis by using the following simple observation: For a set of vertices $V_2$ and $k \ge 2$, let $S^{(k)}_{V_2} \subseteq \mathcal{M}$ denote the set of labels that have been chosen by at least k of the vertices in $V_2$. Then the probability that every vertex of a set of vertices $V_1$ is connected to every vertex in $V_2$ is at most

$$p(V_1, V_2) \le \left( |S^{(2)}_{V_2}|\, p + (1-p)^{|S^{(2)}_{V_2}|} \prod_{v \in V_2} \left(1 - (1-p)^{|S_v - S^{(2)}_{V_2}|}\right) \right)^{|V_1|} \qquad (10)$$

$$\le \left( |S^{(2)}_{V_2}|\, p + \prod_{v \in V_2} \left(1 - (1-p)^{|S_v|}\right) \right)^{|V_1|}. \qquad (11)$$

Indeed, the first of the above inequalities corresponds to the probability that each vertex in $V_1$ either chooses one of the labels shared by at least two vertices in $V_2$, or it is connected to all vertices in $V_2$ by using labels chosen by exactly one vertex in $V_2$.
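
The set $S^{(k)}_{V_2}$ used in this observation is easy to compute from the label sets. A minimal sketch, assuming the label sets are stored in a mapping from vertices to sets; the function name `labels_chosen_at_least` is hypothetical.

```python
from collections import Counter


def labels_chosen_at_least(label_sets, vertex_subset, k=2):
    """Labels chosen by at least k of the vertices in vertex_subset."""
    # Count, for every label, how many vertices of the subset have chosen it.
    counts = Counter(
        label for v in vertex_subset for label in label_sets[v]
    )
    return {label for label, count in counts.items() if count >= k}
```

With k = 2 this returns $S^{(2)}_{V_2}$, whose size is the quantity bounded in the proof of Lemma 4.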

Lemma 4. Let Gn,m.p be a random instance of the random intersection graphs model 

with m = n°,0 < a < 1, p = Q (^\J and mp^ = 0(1). Let also x = for some 

positive constant e < 1 that can be as small as possible. Then, for any y > np^^'^, where 
< c < is a constant, we have Pic{Ay^x) = o(l). 

(2) 

Proof. Fix a set V2 of x vertices. We first give an upper bound on the size of SyJ . Towards 

(2) 

this end, let X = \SyJ\ and notice that X is binomially distributed with parameters 



2 

2-,2 



m, J? = 1 — (1 — pY — xp{l — pY . Since, by assumption xp — t- 0, we have that p < 
Therefore X is stochastically dominated by a binomial random variable Y ^ B ^m, 
By Chernoff bounds we then have, for any t > 0, 



,.2^2 



Pr(^X> + <e 'V^"^^ (12) 

^ ~ pil+t' ' where e' is a positive constant that can be as small as possible. Since 
mp^ = 0(1), we have that t = u ^ "^^^p ^ . By Boole's inequality then, the probability that 
there is a subset V2 of x vertices that has | Sy^ \ > "^^^p _)_ ^ ig at most 



n^e = 0(1). (13) 

(2) 

Now that we have an upper bound on the size of Sy^ that holds whp, notice that by 
the second part of Lemma [2] and the fact that mp^ = 0(1), whp we have 

n (i-(i-p)'"^')<^=-(i4?i^>)- (14) 
V&V2 

Therefore, by (fTT]l . we have that p{Vi,V2) < (2|5||^|p)l^il. By Boole's and Markov's in- 
equality we then have that 






Pr(^,,.) < m(^^ll'^y^p{Vi,V2) (15) 

<m(^'^^«')n-(2|5g|p)^ + o(l) (16) 

< m(^'^^«'^n- (ip^-^'-^'Y + 0(1) (17) 

where the o(l) term corresponds to the error terms from Lemma [2] and equation ()13p . 
Using now the first part of Lemma [2] and an upper bound for the binomial coefficient we 
have 



/ 8np\ 



y 



y 



Pr(ylj,,x) < m (^^J (p^-^^-^ y +0(1). (18) 

Setting y = np^'^'^, for any positive constant c < j^, we have (y > 1 and also) that 
Pi:{Ay^x) = o(l). This completes the proof. 

□ 

Lemma 4 has the following interpretation:

Corollary 3. Let Gn,m,p o, random instance of the random intersection graphs model 
with m = n", < a < I, p = f2 ^y'^^^^ and mp"^ = 0(1). Let also Q be a clique in Gn,m,p 

that is not formed by a single label and also \Q\ ~ up. Then whp, for any label Iq G A4, 
we have that \Q n Lig\ < np^'^'^, where < c < is a constant. 

In particular, if Q is not formed by a single label, then whp it is formed by at least ^ 
distinct labels. 

Proof. By Corollary [51 if Q is not formed by a single label, then given any label Iq £ A4 
which is chosen by some vertex v £ Q, there is a positive constant c' < such that whp 
there are at least vertices in Q that have not chosen Iq. Therefore, we can apply Lemma 
m using any e < ^o^^ specifically, for any such e we have Pt{A i+c j_) = o{l). 

Additionally, this implies that whp if $Q$ is not formed by a single label, it needs at
least $\frac{np}{np^{1+c}} = p^{-c}$ distinct labels. This is also a lower bound on the number of labels needed
by a vertex $v$ in order to connect to all vertices in $Q$.

□ 

Before presenting the proof of our main theorem, we prove the following useful lemma, 
which states that if a large clique is not formed by a single label, then it must contain a 
quite large clique Q' whose edges are formed by distinct labels. 

Lemma 5. Let $G_{n,m,p}$ be a random instance of the random intersection graphs model with
$m = n^{\alpha}$, $0 < \alpha < 1$, $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$ and $mp^2 = O(1)$. Let also $Q$ be any clique in $G_{n,m,p}$
that is not formed by a single label and also $|Q| \sim np$. Then whp, $Q$ contains a clique $Q'$
whose edges are formed by distinct labels and whose size is at least $p^{-c/2}$, for any positive
constant $c <$ .

Proof. Let $Q'$ be a subset of $Q$ which is maximal with respect to the following property $\mathcal{P}$:
"to each pair of vertices $u \neq v$ in $Q'$ we can assign a distinct label $l$, such that $l \in S_u \cap S_v$".

Consider now the set of vertices $W = \{w : S_w \cap S_{Q'}^{(2)} \neq \emptyset\}$, namely the set of vertices
that share a label with at least 2 vertices in $Q'$ (note that $Q' \subseteq W$, because every pair
of vertices in $Q$ is connected). Since $Q'$ is maximal, the set $Q - W$ must be the empty
set. Indeed, if $z \in Q - W$, then (bearing in mind that $Q$ is a clique) $z$ can be connected
to each vertex in $Q'$ using distinct labels, which are also different from those already used
to connect pairs of vertices in $Q'$. Therefore, $Q' \cup \{z\}$ would also have property $\mathcal{P}$, which
contradicts the maximality of $Q'$.

By Corollary 3 now, we have that $|W| < |S_{Q'}^{(2)}|\, np^{1+c}$, where $0 < c <$ is a constant.

Furthermore, by equation (fT3]) . we have that \Sq}\ < "^^^J ^ + i^-J- whp, for any e' > 

that can be as small as possible. Combining the above, and since $mp^2 = O(1)$, we have
that

l^^l- (l + o(l))p^'- ^''^ 
Consequently, the requirement $Q - W = \emptyset$ translates to



or equivalently 



M 

(1 + o(l))npi 



\Q'\ > xlir-TTi^hT^- (21) 



Bearing in mind that $|Q| \sim np$ and that $\epsilon' > 0$ can be as small as possible, this completes
the proof.

□ 

We now present our main theorem. 

Theorem 3 (Single Label Clique Theorem). Let $G_{n,m,p}$ be a random instance of the
random intersection graphs model with $m = n^{\alpha}$, $0 < \alpha < 1$ and $mp^2 = O(1)$. Then whp,
any clique $Q$ of size $|Q| \sim np$ in $G_{n,m,p}$ is formed by a single label. In particular, the
maximum clique is formed by a single label.

Proof. We first note that, as discussed in the beginning of Section 4, when $p = o\left(\sqrt{\frac{1}{nm}}\right)$,
by slightly modifying an argument of [3] (in particular Lemma 5 there), we can see that
$G_{n,m,p}$ almost surely has no cycle of size $k \ge 3$ whose edges are formed by $k$ distinct labels.
Therefore, the maximum clique of $G_{n,m,p}$ when $p = o\left(\sqrt{\frac{1}{nm}}\right)$ is formed by exactly one
label and our theorem holds. Consequently, we will assume w.l.o.g. for the remainder of
the proof that $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$.

Let $Q$ be a clique of size $|Q| \sim np$ in $G_{n,m,p}$. By Lemma 5, if $Q$ is not formed by a single
label, then $G_{n,m,p}$ must contain a clique $Q'$ whose edges are formed by distinct labels and

whose size is at least /3 p~2 , for any positive constant c < By Markov's inequality, 
the probability that such a Q' exists in Gn,m,p is at most 

Indeed, we can choose the vertices in Q' arranged in a line in at most ways. Then we 
can select the labels needed for the k-th vertex to connect to all vertices to its right in at 
most {i^i^ ways and each such label must be chosen by the $k$-th vertex, as well as another
vertex to its right (hence the term p^^^ in the product). Upper bounding the binomial
coefficients in the above and using the fact $mp^2 = O(1)$, we get




Therefore, whp Q' does not exist in Gn,m,p, which completes the proof. 

□ 

Notice that, by Theorem 3, the maximum clique in $G_{n,m,p}$ with $m = n^{\alpha}$, $0 < \alpha < 1$
and $mp^2 = O(1)$ must be one of the sets $L_l$, $l \in \mathcal{M}$. Therefore, the clique number of $G_{n,m,p}$
can be bounded using the first part of Lemma 2. In particular:

Corollary 4. Let $G_{n,m,p}$ be a random instance of the random intersection graphs model
with $m = n^{\alpha}$, $0 < \alpha < 1$, $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$ and $mp^2 = O(1)$. Then, whp the maximum clique
$Q$ of $G_{n,m,p}$ satisfies $|Q| \sim np$.
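
The concentration step behind Corollary 4 can be sketched with a standard Chernoff bound. This is only an illustrative derivation, under the assumption that the first part of Lemma 2 is a statement of this type; the deviation parameter $\delta$ below is our own choice and is not taken from the text.

```latex
% Sketch: each label set L_l has size |L_l| ~ Bin(n, p), with mean np.
% For 0 < \delta \le 1, the multiplicative Chernoff bound gives
\Pr\Bigl( \bigl| |L_l| - np \bigr| \ge \delta np \Bigr)
  \le 2 \exp\!\left( -\frac{\delta^2 np}{3} \right).
% Assumed choice \delta = (np)^{-1/3}, followed by a union bound over the m = n^{\alpha} labels:
\Pr\Bigl( \exists\, l \in \mathcal{M} : \bigl| |L_l| - np \bigr| \ge (np)^{2/3} \Bigr)
  \le 2 n^{\alpha} \exp\!\left( -\frac{(np)^{1/3}}{3} \right) = o(1),
% since p = \Omega(\sqrt{1/(nm)}) gives np = \Omega(n^{(1-\alpha)/2}), which grows polynomially in n.
```

Under this sketch every label set has size $(1 + o(1))np$ whp, so a maximum clique that coincides with one of the sets $L_l$ has size asymptotically equal to $np$.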

5 Label Reconstruction 

One of the implications of our main Theorem 3 is that whp we can find the maximum
clique in $G_{n,m,p}$ with $m = n^{\alpha}$, $0 < \alpha < 1$ and $mp^2 = O(1)$ in polynomial time, just by
looking at the associated bipartite graph $B_{n,m,p}$. In the following algorithm, we denote by
$L_i$ the set of neighbors of label $i \in \mathcal{M}$ in $B_{n,m,p}$, which can be determined in $O(n)$ time.

Algorithm MAX-CLIQUE_FROM_LABELS

Input: $B_{n,m,p}$

1. Set $Q = \emptyset$;

2. for $i = 1$ to $m$ do

   % Check if the clique induced by label i is larger %

3.    if $|L_i| > |Q|$ then set $Q = L_i$; endfor

4. Output $Q$;
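
The steps of the algorithm can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `sample_label_sets` and `max_clique_from_labels`, and the representation of the bipartite graph as a list of vertex sets (one per label), are assumptions made for the example.

```python
import random


def sample_label_sets(n, m, p, seed=None):
    """Sample a bipartite graph B_{n,m,p} (hypothetical helper).

    Each of the n vertices chooses each of the m labels independently
    with probability p. Returns a list whose i-th entry is L_i, the set
    of vertices that chose label i.
    """
    rng = random.Random(seed)
    label_sets = [set() for _ in range(m)]
    for v in range(n):
        for i in range(m):
            if rng.random() < p:
                label_sets[i].add(v)
    return label_sets


def max_clique_from_labels(label_sets):
    """Return the largest label set, following steps 1-4 of the algorithm.

    Every L_i induces a clique, since all of its vertices share label i.
    Whether this is a *maximum* clique depends on the assumptions of
    Theorem 3; the function itself only compares label-set sizes.
    """
    best = set()                      # step 1: Q = empty set
    for label_set in label_sets:      # step 2: loop over the m labels
        if len(label_set) > len(best):
            best = label_set          # step 3: keep the larger clique
    return set(best)                  # step 4: output Q (as a copy)
```

For instance, `max_clique_from_labels(sample_label_sets(1000, 30, 0.05, seed=0))` scans 30 label sets and returns the largest one. The parameter values here are arbitrary and are not claimed to satisfy the asymptotic conditions of the theorem.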



By Theorem 3, when $m = n^{\alpha}$, $0 < \alpha < 1$ and $mp^2 = O(1)$, Algorithm
MAX-CLIQUE_FROM_LABELS returns the maximum clique of $G_{n,m,p}$ whp, in $O(nm)$ time.
Therefore, the randomness of the model works in our favor for this case. Indeed, since
any graph can be written as an intersection graph with at most $\binom{n}{2}$ labels, the problem of
finding a maximum clique in a graph, given its label representation, remains NP-complete.
Furthermore, it remains hard even when the intersection number is $n^{\alpha}$, $0 < \alpha < 1$ unless
the exponential time hypothesis fails (see e.g. 0]). 
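
A small deterministic example shows why picking the largest label set is not enough for an arbitrary label representation. The sketch below uses a triangle whose three edges are formed by three distinct labels; the helper names are hypothetical and the brute-force search is only meant for tiny inputs.

```python
from itertools import combinations


def intersection_graph_edges(label_sets):
    """Edges of the intersection graph: u ~ v iff they share a label."""
    edges = set()
    for label_set in label_sets:
        for u, v in combinations(sorted(label_set), 2):
            edges.add((u, v))
    return edges


def brute_force_max_clique(vertices, edges):
    """Exhaustive maximum clique search (exponential time, tiny inputs only)."""
    vertices = sorted(vertices)
    for k in range(len(vertices), 0, -1):
        for cand in combinations(vertices, k):
            if all((u, v) in edges for u, v in combinations(cand, 2)):
                return set(cand)
    return set()


# A triangle on vertices 0, 1, 2 in which every edge uses its own label.
label_sets = [{0, 1}, {1, 2}, {0, 2}]
edges = intersection_graph_edges(label_sets)

largest_label_set = max(label_sets, key=len)           # what the algorithm would output
true_max_clique = brute_force_max_clique({0, 1, 2}, edges)
```

Here the largest label set has 2 vertices, while the maximum clique is the whole triangle. Theorem 3 states that, whp, this situation does not arise for cliques of size asymptotically equal to $np$ in the random model.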

This leads to the following natural question: Could one infer any information about
the structure of the associated bipartite graph when provided with $G_{n,m,p}$ (i.e. the vertices
and the edges of the graph)? Notice here that a graph $G_{n,m,p}$ can correspond to more
than one associated bipartite graph. However, we show here that the problem of finding
the associated bipartite graph given $G_{n,m,p}$ and the actual values of $m$, $n$ and $p$ is solvable
whp when the number of labels is less than the number of vertices; namely, the maximum
likelihood estimation method will provide a unique solution (up to permutations of the
labels). More specifically, if $\mathcal{B}_{n,m,p}$ is the set of non-isomorphic associated bipartite graphs
that give rise to $G_{n,m,p}$, then there is some $B^* \in \mathcal{B}_{n,m,p}$ such that
$\Pr(B^* \mid G_{n,m,p}) > \Pr(B \mid G_{n,m,p})$, for all $\mathcal{B}_{n,m,p} \ni B \neq B^*$.
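
To make the non-uniqueness and the likelihood comparison concrete, the following sketch builds two different label representations of the same triangle and compares their probabilities under the model. It is a toy illustration under stated assumptions: the helper names are hypothetical, the values $n = 3$, $m = 3$, $p = 0.1$ are arbitrary, and the comparison is between two individual labelled bipartite graphs, ignoring the multiplicity that comes from permuting labels.

```python
from itertools import combinations


def edges_from_labels(label_sets):
    """Edge set of the intersection graph defined by the label sets."""
    return {
        (u, v)
        for label_set in label_sets
        for u, v in combinations(sorted(label_set), 2)
    }


def bipartite_likelihood(label_sets, n, p):
    """Probability of one specific bipartite graph under the model.

    Each of the n * m (vertex, label) pairs is present independently with
    probability p, so the probability depends only on how many pairs are present.
    """
    m = len(label_sets)
    chosen = sum(len(label_set) for label_set in label_sets)
    return p ** chosen * (1 - p) ** (n * m - chosen)


n, p = 3, 0.1
rep_single = [{0, 1, 2}, set(), set()]   # one label shared by all three vertices
rep_pairs = [{0, 1}, {1, 2}, {0, 2}]     # one distinct label per edge

same_graph = edges_from_labels(rep_single) == edges_from_labels(rep_pairs)
lik_single = bipartite_likelihood(rep_single, n, p)
lik_pairs = bipartite_likelihood(rep_pairs, n, p)
```

Both representations give the same triangle, but the single-label one uses 3 vertex-label pairs instead of 6 and is therefore more likely whenever $p < 1/2$. This matches the intuition behind the maximum likelihood statement, although it does not replace the asymptotic argument.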

Theorem 4. Let $G_{n,m,p}$ be a random instance of the random intersection graphs model
with $m = n^{\alpha}$, $0 < \alpha < 1$, $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$ and $mp^2 = O(1)$. Then, whp the bipartite graph
$B_{n,m,p}$ associated to $G_{n,m,p}$ is uniquely determined, up to permutations of the labels.

Proof. Let $L_i$ denote the set of vertices that have chosen label $i \in \mathcal{M}$. Given $G_{n,m,p}$,
we will refer to $\{L_i, i \in \mathcal{M}\}$ as its label representation. Notice then that the associated
bipartite graph is uniquely defined by the label representation.

Suppose now for the sake of contradiction that $\{L_i^{(1)}, i \in \mathcal{M}\}$ and $\{L_i^{(2)}, i \in \mathcal{M}\}$
are two distinct label representations (up to permutations of the labels) of $G_{n,m,p}$, where
$m = n^{\alpha}$, $0 < \alpha < 1$, $p = \Omega\left(\sqrt{\frac{1}{nm}}\right)$ and $mp^2 = O(1)$. Notice that, by the first part of
Lemma 2, whp both of these label representations should satisfy $|L_i^{(j)}| \sim np$, for any $i \in \mathcal{M}$
and $j = 1, 2$.

Notice then that there must be a label $l$, such that $L_l^{(1)} \notin \{L_i^{(2)}, i \in \mathcal{M}\}$, i.e. the clique
induced by label $l$ can be edge covered by more than one other clique of size asymptotically
equal to $np$. However, by Theorem 3, whp no clique $Q$ of size $|Q| \sim np$ can be formed by
more than one label, which contradicts the assumption that $L_l^{(1)} \notin \{L_i^{(2)}, i \in \mathcal{M}\}$.

Consequently, $\{L_i^{(1)}, i \in \mathcal{M}\}$ and $\{L_i^{(2)}, i \in \mathcal{M}\}$ must coincide, up to permutations
of the labels, i.e. $L_l^{(1)} \in \{L_i^{(2)}, i \in \mathcal{M}\}$, for every $l \in \mathcal{M}$. This completes the proof.

□ 

Notice that the uniqueness of the bipartite graph can also be proved in the case where
$p = o\left(\sqrt{\frac{1}{nm}}\right)$. Indeed, in this case $G_{n,m,p}$ almost surely has no cycle of size $k \ge 3$ whose
edges are formed by $k$ distinct labels (see also the beginning of Section 4). Therefore, every
clique of size at least 3 is formed by a single label and so the proof of Theorem 4 applies
in this (sparser) case also.



6 Conclusions 



In this work, we studied the maximum clique problem by relating it to the intersection
number of the input graph. In particular, we first proved that if the intersection number
of the graph $G$ is sufficiently small, then a simple algorithm can find a maximum clique in
$G$ in polynomial time. We then considered random instances of the random intersection
graphs model as input graphs. In particular, by proving the Single Label Clique Theorem,
we provided new, more general and asymptotically tight bounds for the clique number
of $G_{n,m,p}$ when $m = n^{\alpha}$, $\alpha < 1$. We also claim that our proof carries over for $\alpha < 2$,
provided there is a lower bound on $p$ (in particular, we claim that our analysis can be
applied also for $mp^2 = \Theta(1)$). One of the consequences of our theorem is that we can
use the label representation of $G_{n,m,p}$ to find a maximum clique in polynomial time whp.
This raised the question of whether we could reconstruct the label choices of the vertices
in $G_{n,m,p}$ given only the graph structure. We proved here that the label reconstruction
problem is solvable whp when the number of labels is less than the number of vertices.
Finding efficient algorithms for constructing such a label representation is left as an open
problem for future research. In view of the equivalence results between random intersection
graphs and Erdős-Rényi random graphs, we expect that our work will also shed light on
the problem of finding maximum cliques for input graphs generated by the latter model.